Abstract

In this paper we consider a semiparametric regression model involving a $d$-dimensional quantitative explanatory variable $X$ and including a dimension reduction of $X$ via an index $\beta'X$. In this model, the main goal is to estimate the Euclidean parameter $\beta$ and to predict the real response variable $Y$ conditionally on $X$. Our approach is based on the sliced inverse regression (SIR) method and on optimal quantization in $L^p$-norm. We obtain the convergence of the proposed estimators of $\beta$ and of the conditional distribution. Simulation studies show the good numerical behavior of the proposed estimators for finite sample sizes.

Keywords Optimal quantization, Semiparametric regression model, Sliced Inverse Regression (SIR), Dimension reduction






1 Introduction 



In regression analysis, the main goal is to seek a parsimonious characte- 
rization of the conditional distribution of a response variable Y given a d- 
dimensional explanatory variable X. In many statistical applications, the di- 
mension d of X becomes large and therefore the statistical analysis becomes 
difficult. A usual approach to overcome this problem is to reduce the dimension 
of the explanatory part of the regression model without much loss of informa- 
tion on regression and without requiring a pre-specified parametric model. This 
has been achieved through the introduction of sufficient dimension reduction 
methods whose goal is to reduce the dimension of X by replacing it with a 
minimal set of linear combinations of X. 

In this paper, we consider the following semiparametric single index regression model
$$Y = f(\beta'X, \varepsilon), \tag{1}$$
where the real response variable $Y$ is linked, via an unknown link function $f$, to the $d$-dimensional random vector $X$ only through the unknown $d$-dimensional parameter $\beta$. The random variable $\varepsilon$ is an error term independent of $X$. Model (1) can also be defined as $Y \perp X \mid \beta'X$, where "$\perp$" stands for independence. This means that, for the regression of $Y$ on $X$, a sufficient statistic is given by $\beta'X$.

Sliced inverse regression (SIR) introduced by Duan and Li (1991), principal Hessian directions (see for instance Cook, 1998) and sliced average variance estimation (SAVE) developed by Cook (2000) are classical methods for identifying and estimating the linear subspace spanned by $\beta$. Without additional assumptions, only this subspace is identifiable in the model; it is called the central dimension reduction subspace or the effective dimension reduction (EDR) space, according to the considered approach. To estimate this subspace, SIR uses properties of the conditional expectation of $X$ given $Y$ under mild assumptions on the distribution of $X$, while SAVE is based on properties of the conditional variance of $X$ given $Y$. In this paper, we focus on the SIR approach, which has become the most standard method in this area because of its simple and useful estimation scheme. The main idea is to divide the whole range of $Y$ into slices and to consider the SIR matrix of interest, defined as the covariance matrix of the conditional mean of $X$ in each slice. More precisely, let $\Sigma = \mathrm{Var}(X)$ and let us assume that the range of $Y$ is sliced into $H$ non-overlapping slices $S_1, \ldots, S_H$. Let $\tilde Y = \{h : Y \in S_h\}$, $h = 1, \ldots, H$, be a discrete version of the continuous response variable $Y$. Note that $\tilde Y$ can be seen as the projection of $Y$ on a rough grid. Under two assumptions (the first one concerns the distribution of $X$ and the other one ensures that the regression model is not a known pathological one), it can be shown that the principal eigenvector of the matrix $\Sigma^{-1}\Gamma$ is collinear to $\beta$, where the SIR kernel matrix $\Gamma$ is given by
$$\Gamma = \sum_{h=1}^{H} P(Y \in S_h)\,\big(E[X \mid Y \in S_h] - E[X]\big)\big(E[X \mid Y \in S_h] - E[X]\big)'.$$
This matrix can be easily estimated by substituting empirical versions of all the moments for their theoretical counterparts.
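As an illustration of the slicing scheme just described, here is a minimal NumPy sketch of the classical SIR estimator. It assumes slices containing (roughly) equal numbers of observations, a common practical choice that the text does not prescribe; the function name `sir_direction` and the simulated cubic single-index data are hypothetical and only serve as a check.

```python
import numpy as np


def sir_direction(X, Y, H=5):
    """Classical SIR: principal eigenvector of Sigma^{-1} Gamma, where Gamma is
    the covariance matrix of the slice means of X (weighted by slice frequencies)."""
    n, d = X.shape
    x_bar = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)           # empirical Sigma = Var(X)
    # H slices of (roughly) equal size, built from the order statistics of Y.
    slices = np.array_split(np.argsort(Y), H)
    gamma = np.zeros((d, d))
    for idx in slices:
        p_h = len(idx) / n                    # estimates P(Y in S_h)
        diff = X[idx].mean(axis=0) - x_bar    # estimates E[X | Y in S_h] - E[X]
        gamma += p_h * np.outer(diff, diff)
    # Sigma^{-1} Gamma is not symmetric in general, hence eig rather than eigh.
    vals, vecs = np.linalg.eig(np.linalg.solve(sigma, gamma))
    return np.real(vecs[:, np.argmax(np.real(vals))])


# Illustrative check on a cubic single-index model (hypothetical data).
rng = np.random.default_rng(0)
n, d = 2000, 4
beta = np.array([1.0, -1.0, 0.0, 0.0])
X = rng.standard_normal((n, d))
Y = (X @ beta) ** 3 + rng.standard_normal(n)
b_hat = sir_direction(X, Y)
cos2 = (b_hat @ beta) ** 2 / ((b_hat @ b_hat) * (beta @ beta))
print(round(cos2, 3))
```

Only the direction of the returned vector is meaningful, which is why the check uses a squared cosine rather than comparing `b_hat` with `beta` directly.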

In this paper, we first propose to use optimal quantization in order to find an approximation of the SIR kernel matrix $\Gamma$. A brief panorama of optimal quantization is given in Section 2 in order to recall the principle of optimal quantization, which is a keystone of our method. In Section 3, we describe the estimator of the direction of $\beta$. The basic idea is to replace $X$ by $\hat X^N$, its optimal quantizer in $L^p$-norm taking a finite number $N$ of values. Let us denote $\Gamma_N = \mathrm{Var}(E[\hat X^N \mid \tilde Y])$. We show in Section 3 that $\Gamma_N$ converges to $\Gamma$ as $N$ goes to infinity and we control the rate of convergence. From this result, we deduce the convergence of the principal eigenvectors of the sequence $(\Sigma^{-1}\Gamma_N)$ to the direction of $\beta$. In practice, optimal quantization is frequently used to compute approximations of conditional expectations, see for instance de Saporta et al. (2010a) and de Saporta et al. (2010b). In Section 4, we propose to use optimal quantization to forecast $Y$ given $X$, that is, $Y$ given $\beta'X$ under the considered model. We provide a theoretical result which specifies that the forecast error tends to zero as the numbers of quantizers tend to infinity. The corresponding method is particularly interesting since most papers on SIR in the literature only focus on the estimation of the direction of $\beta$ and do not consider the underlying regression model in its entirety. Few theoretical results combining SIR, to estimate the central reduction space, with a nonparametric estimator of the link function are available in the literature; Gannoun et al. (2004) is one of them. In Section 5, we illustrate our approach on simulation studies and provide numerical results showing the good behavior of the proposed estimator for finite sample sizes. All the proofs of the mathematical results of convergence are deferred to the Appendix.

2 About optimal quantization 

Originally, the word "quantization" has been used in signal and information theory by engineers since the fifties. Quantization was devoted to the discretization of a continuous signal by a finite number of "quantizers". Optimizing the position of the "quantizers" is very useful to obtain an efficient transmission of the signal. In mathematics, the problem of optimal quantization is to find the best approximation of the continuous distribution of a random variable by a discrete law with a fixed number of charged points. First used for a one-dimensional signal, the method has been developed in the multi-dimensional case (see for instance Zador (1963) or Pagès (1998)) and extensively used as a tool to solve problems arising in numerical probability. It is also frequently used to solve problems in finance, such as optimal stopping, control or filtering (see for example Pagès et al. (2004a), Pagès et al. (2004b), Bally et al. (2005), Pagès and Pham (2005)). More recently, de Saporta et al. (2010b) used quantization to develop a numerical method for the optimal stopping of piecewise deterministic Markov processes, with an application to the optimization of reliability maintenance, see de Saporta et al. (2010a).

Optimal quantization is well adapted to the approximation of conditional expectations. In this paper we use it to tackle the estimation of the conditional distribution of $Y$ given $\beta'X$ in the regression problem (1). We also specify how to use quantization in the estimation process of the SIR kernel matrix $\Gamma$.

In the remainder of this section, we first present the principle of the quantization method for a random vector $X$. Then we provide a result on forecasting via quantization in a nonparametric regression model.
2.1 Optimal quantization for a random vector X

Let $X$ be a random vector from a probability space $(\Omega, \mathcal{F}, P)$ to $\mathbb{R}^d$. We suppose that $X$ is finite in $L^p$-norm for some $p > 1$; that is to say, $\|X\|_p = E[|X|^p]^{1/p}$ is finite (for $x \in \mathbb{R}^d$, $|x|$ denotes the Euclidean norm on $\mathbb{R}^d$). The purpose of quantization is to approximate the continuous distribution of $X$ by a discrete one with a finite support whose cardinality is $N$. Let $\gamma_N$ be an $N$-grid of $\mathbb{R}^d$. Let $\mathrm{Proj}_{\gamma_N}(x)$ be the point of $\gamma_N$ which is the nearest one to $x$ for the Euclidean norm. The quantization error with respect to $\gamma_N$ is
$$Q_N^{(P_X)}(\gamma_N) = \|X - \mathrm{Proj}_{\gamma_N}(X)\|_p.$$
Existence (but not uniqueness) of an optimal $N$-grid, which minimizes $Q_N^{(P_X)}(\cdot)$ and at which its gradient vanishes, has been shown under the following assumption on $P_X$: $P_X$ does not charge hyperplanes. From now on, for any random vector $X$ in $L^p$ which verifies this assumption, let us denote by $\hat X^N$ the projection of $X$ on an optimal $N$-grid. Note that $\hat X^N$ is a discrete random vector which verifies the following useful stationarity property:
$$E[X \mid \hat X^N] = \hat X^N. \tag{2}$$
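The stationarity property (2) can be observed numerically. The sketch below quantizes an empirical one-dimensional sample in the quadratic case $p = 2$ by Lloyd's fixed-point iteration, a standard quadratic-quantization algorithm used here as a stand-in for the stochastic algorithms of the quantization literature; the helper names `nearest` and `lloyd_1d` are hypothetical. At a fixed point of the iteration, every grid point is the mean of the sample points projected on it, which is the empirical counterpart of (2).

```python
import bisect
import random


def nearest(grid, x):
    """Index of the point of the sorted grid closest to x (i.e. Proj_grid(x))."""
    k = bisect.bisect_left(grid, x)
    if k == 0:
        return 0
    if k == len(grid):
        return len(grid) - 1
    return k if grid[k] - x < x - grid[k - 1] else k - 1


def lloyd_1d(sample, N, max_iter=1000):
    """Quadratic (p = 2) quantization of an empirical 1-D law by Lloyd's
    fixed-point iteration: each grid point is replaced by the mean of its cell."""
    xs = sorted(sample)
    n = len(xs)
    grid = [xs[int((k + 0.5) * n / N)] for k in range(N)]  # empirical quantiles
    for _ in range(max_iter):
        cells = [[] for _ in range(N)]
        for x in xs:
            cells[nearest(grid, x)].append(x)
        new_grid = sorted(sum(c) / len(c) if c else g for c, g in zip(cells, grid))
        if new_grid == grid:   # the partition, hence the grid, no longer changes
            break
        grid = new_grid
    return grid


random.seed(1)
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]
grid = lloyd_1d(sample, N=5)

# Empirical version of E[X | X^N] = X^N: each grid point equals its cell mean.
cells = [[] for _ in grid]
for x in sample:
    cells[nearest(grid, x)].append(x)
max_gap = max(abs(sum(c) / len(c) - g) for c, g in zip(cells, grid) if c)
print(grid, max_gap)
```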

Some results about the asymptotic quantization error have been obtained by Zador (1963). The following theorem (see Corollary II.6.7 of Luschgy and Graf (2000)) is a generalization of a result due to Pierce (1970). It gives the rate of convergence of the discrete approximation $\hat X^N$ to $X$ in $L^p$ for large values of $N$ and will be very useful in what follows.

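Theorem 1 If $\|X\|_{p+\delta}$ is finite for some $\delta > 0$, then there exist real numbers $D_1$, $D_2$, $D_3$ such that, for all $N \geq D_3$, we have
$$\|X - \hat X^N\|_p^p \leq \frac{1}{N^{p/d}}\left(D_1\|X\|_{p+\delta}^{p+\delta} + D_2\right). \tag{3}$$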
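A worked instance of the rate in (3), for $d = 1$: if $X$ is uniform on $[0, 1]$, the optimal $N$-grid is the set of cell midpoints $(2k-1)/(2N)$, $k = 1, \ldots, N$ (a classical fact), so that $\|X - \hat X^N\|_2^2 = N\int_{-1/(2N)}^{1/(2N)} t^2\,dt = \frac{1}{12N^2}$, i.e. $\|X - \hat X^N\|_2$ decays exactly like $N^{-1/d}$. The short computation below, with a hypothetical helper `l2_error_uniform`, approximates this error by a midpoint Riemann sum and checks that $N\,\|X - \hat X^N\|_2$ stays at $1/\sqrt{12}$.

```python
import math


def l2_error_uniform(N, M=200_000):
    """||X - X^N||_2 for X ~ U[0, 1] and the midpoint N-grid, approximated by a
    midpoint Riemann sum with M nodes (M is chosen as a multiple of N)."""
    grid = [(2 * k + 1) / (2 * N) for k in range(N)]
    total = 0.0
    for i in range(M):
        x = (i + 0.5) / M
        k = min(int(x * N), N - 1)   # index of the cell [k/N, (k+1)/N) containing x
        total += (x - grid[k]) ** 2
    return math.sqrt(total / M)


ratios = {N: N * l2_error_uniform(N) for N in (5, 10, 20, 40)}
for N, r in ratios.items():
    # N * ||X - X^N||_2 stays at 1/sqrt(12), about 0.2887: the error decays like 1/N.
    print(N, round(r, 6))
```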

2.2 Forecasting in a nonparametric regression model 

Let us consider here the following nonparametric regression model:
$$Y = f(U, \varepsilon), \tag{4}$$
where $U : (\Omega, \mathcal{F}, P) \to \mathbb{R}^k$ is a random covariate, $\varepsilon$ is a random term independent of $U$, and $f$ is an unknown real link function. We propose a method to forecast $Y$ given $U$ based on a quantization approach. First we quantize $U$ and $Y$. Let $\hat U^N$ and $\hat Y^N$ denote their optimal discrete approximations. We denote by $P$ the transition matrix from $\hat U^N$ to $\hat Y^N$; that is to say, if $\gamma_N$ and $\delta_N$ are optimal $N$-grids for the quantization of $U$ and $Y$,
$$\forall u \in \gamma_N,\ \forall y \in \delta_N, \quad P(u, y) = P(\hat Y^N = y \mid \hat U^N = u).$$

Consider the discrete random variable $\tilde Y^N$ such that $(\hat U^N, \tilde Y^N)$ is a stopped Markov chain with transition matrix $P$. We propose $\tilde Y^N$ as a predictor of $Y$ given $U$. More precisely, for a fixed $u$, we forecast $Y = f(u, \varepsilon)$ by the conditional law of $\tilde Y^N$ given $\hat U^N = \mathrm{Proj}_{\gamma_N}(u)$. Hence, in all the results concerning the forecast of $Y$, it is equivalent to use $\tilde Y^N$ or $\hat Y^N$. We specify in Theorem 2 how, for a fixed $u$, the discrete probability law given by $P(\mathrm{Proj}_{\gamma_N}(u), y)$ on the $N$ points $y$ of the grid $\delta_N$ is a good approximation of the distribution of $Y$ given $U = u$. Knowing this distribution, we propose to use $E[\tilde Y^N \mid U = u]$ as a predictor of $Y$. Note that it is also easy to get conditional quantiles and a forecast interval for $Y$ with a given confidence level.
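In practice the transition matrix is estimated by counting. The following pure-Python sketch assumes given one-dimensional grids for $U$ and $Y$ (in the method above they would be optimal quantization grids; here they are simply fixed, which is an assumption of the example), estimates $P$ from a sample by empirical frequencies, and returns the discrete law $P(\mathrm{Proj}_{\gamma_N}(u), \cdot)$ together with its mean. All function names and the toy data are hypothetical.

```python
import bisect
from collections import Counter, defaultdict


def project(grid, x):
    """Index of the point of the sorted grid nearest to x."""
    k = bisect.bisect_left(grid, x)
    if k == 0:
        return 0
    if k == len(grid):
        return len(grid) - 1
    return k if grid[k] - x < x - grid[k - 1] else k - 1


def transition_matrix(U, Y, grid_u, grid_y):
    """Empirical P(i, j) = P(Y^N = grid_y[j] | U^N = grid_u[i]), stored row by row."""
    counts = defaultdict(Counter)
    for u, y in zip(U, Y):
        counts[project(grid_u, u)][project(grid_y, y)] += 1
    return {i: {j: c / sum(row.values()) for j, c in row.items()}
            for i, row in counts.items()}


def conditional_law(P, grid_u, grid_y, u):
    """Discrete forecast law given U^N = Proj(u), as sorted (value, prob) pairs."""
    row = P.get(project(grid_u, u), {})
    return sorted((grid_y[j], p) for j, p in row.items())


def law_mean(law):
    """Mean of a discrete law given as (value, prob) pairs."""
    return sum(v * p for v, p in law)


# Toy data: Y = 2U without noise, with 10-point grids on [0, 1] and [0, 2].
U = [(i + 0.5) / 100 for i in range(100)]
Y = [2 * u for u in U]
grid_u = [(2 * k + 1) / 20 for k in range(10)]
grid_y = [(2 * k + 1) / 10 for k in range(10)]
P = transition_matrix(U, Y, grid_u, grid_y)
law = conditional_law(P, grid_u, grid_y, 0.42)
print(law, law_mean(law))
```

For $u = 0.42$, all sample points in the cell of $\mathrm{Proj}(u) = 0.45$ have responses projected on $0.9$, so the forecast law is a point mass there.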

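In order to get the theoretical result, let us introduce the following assumptions:

(A1) $\exists\, p \geq 1$ s.t. $U, Y \in L^p$,

(A1') $\exists\, \delta > 0$ s.t. $U, Y \in L^{p+\delta}$,

(A2) $\exists\, [f]_{Lip} > 0$ s.t. $\forall u, v \in \mathbb{R}^k$, $\|f(u, \varepsilon) - f(v, \varepsilon)\|_p \leq [f]_{Lip}|u - v|$,

(A3) The distributions of $U$ and $Y$ do not charge hyperplanes.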

The following theorem gives the convergence of the conditional distribution of $\hat Y^N$ given $\hat U^N$ to the conditional law of $Y$ given $U$ in the regression model (4). More precisely, let $\phi$ be a Lipschitz function; we show that the $L^p$-norm of $E[\phi(Y) \mid U] - E[\phi(\hat Y^N) \mid \hat U^N]$ is bounded by a quantity involving the $L^p$ quantization errors of the variables $Y$ and $U$. Thanks to Theorem 1, the forecast error decreases to 0 as $N$ tends to infinity, with rate $N^{-1/k}$.

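Theorem 2 For every Lipschitz function $\phi$ with Lipschitz constant $[\phi]_{Lip}$, under assumptions (A1)-(A3), we have
$$\big\|E[\phi(Y) \mid U] - E[\phi(\hat Y^N) \mid \hat U^N]\big\|_p \leq 2[\phi]_{Lip}[f]_{Lip}\|U - \hat U^N\|_p + [\phi]_{Lip}\|Y - \hat Y^N\|_p.$$
Moreover, if assumption (A1) is replaced by assumption (A1'), the rate of convergence is given by
$$\big\|E[\phi(Y) \mid U] - E[\phi(\hat Y^N) \mid \hat U^N]\big\|_p = O\!\left(\frac{1}{N^{1/k}}\right).$$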

The proof of this theorem is deferred to Appendix A.1.

In the next two sections, we first estimate the linear subspace spanned by $\beta$ in model (1) using optimal quantization. Then we combine the proposed estimator of the EDR direction with the previous forecasting approach based on optimal quantization in order to predict $Y$ given $X$ in regression model (1).



3 Estimation method of the direction of $\beta$

Now, consider the semiparametric regression model (1). Let us first remark that, since $\beta$ and $f$ are simultaneously unknown in this model, we can only identify the linear subspace spanned by $\beta$. Let us also recall that $\Gamma_N$ denotes the covariance matrix of $E[\hat X^N \mid \tilde Y]$, where $\tilde Y = \mathrm{Proj}_\gamma(Y)$ is the projection on a (not necessarily optimal) grid $\gamma$ of $\mathbb{R}$. Let $\hat\beta_N$ be a principal eigenvector of the matrix $\Sigma^{-1}\Gamma_N$. The next result, stated in Theorem 3, says that, for large $N$, the direction of $\hat\beta_N$ is a good approximation of that of $\beta$. We also give in Theorem 4 a stronger result: there exists a sequence $(\hat\beta_N)$ of principal eigenvectors of $\Sigma^{-1}\Gamma_N$ which converges to $\beta$ when the number $N$ of quantizers of $X$ goes to infinity. This result is due to the fact that the matrices $\Gamma$ and $\Gamma_N$ are very close for large $N$ (see Lemma 6 given in Appendix A.2).
We need the following additional assumptions, which are usually made in the SIR framework:

(A5) $\exists\, y \in \gamma$, $E[(X - E[X])'\beta \mid \tilde Y = y] \neq 0$,

(A6) $X$ has an elliptically symmetric distribution.

The next two assumptions enable the optimal quantization of $X$:

(A7) $\exists\, p > 1$ s.t. $X \in L^p \cap L^q$ with $\frac{1}{p} + \frac{1}{q} = 1$,

(A8) The distribution of $X$ does not charge hyperplanes.

The following assumption is used to obtain a rate of convergence:

(A9) $\exists\, \delta > 0$ s.t. $X \in L^{p+\delta}$.

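The next result gives the convergence of the direction of $\hat\beta_N$, for any sequence $(\hat\beta_N)$ of principal eigenvectors of the sequence $(\Sigma^{-1}\Gamma_N)$, to the direction of $\beta$ as the number of quantizers $N$ tends to infinity. For this, we need to define
$$\cos^2(\hat\beta_N, \beta) = \frac{(\hat\beta_N'\beta)^2}{(\hat\beta_N'\hat\beta_N) \times (\beta'\beta)}.$$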
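Because only the direction of $\beta$ is identifiable, this criterion is invariant under rescaling and sign changes of $\hat\beta_N$. A small illustrative helper (the name `cos2` is hypothetical) makes this explicit:

```python
def cos2(b_hat, b):
    """Squared cosine (b_hat' b)^2 / ((b_hat' b_hat)(b' b)) between two vectors."""
    dot = sum(x * y for x, y in zip(b_hat, b))
    return dot ** 2 / (sum(x * x for x in b_hat) * sum(y * y for y in b))


beta = [1.0, -1.0, 0.0, 0.0]
print(cos2([2.0, -2.0, 0.0, 0.0], beta))   # rescaled direction: 1.0
print(cos2([-1.0, 1.0, 0.0, 0.0], beta))   # opposite sign: 1.0
print(cos2([1.0, 1.0, 0.0, 0.0], beta))    # orthogonal direction: 0.0
```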

Theorem 3 Under (A5)-(A8), for any sequence $(\hat\beta_N)$ of principal eigenvectors of the sequence $(\Sigma^{-1}\Gamma_N)$, we have
$$\cos^2(\hat\beta_N, \beta) \to 1 \quad \text{as } N \to \infty.$$

The proof of this theorem is deferred to Appendix A.3. For the next theorem, we recall that, for $x \in \mathbb{R}^d$, $|x|$ denotes the Euclidean norm on $\mathbb{R}^d$.

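Theorem 4 Under (A5)-(A8), there exists a sequence of principal eigenvectors of $\Sigma^{-1}\Gamma_N$, denoted by $(\hat\beta_N)$, which converges to $\beta$. Indeed, there exist real constants $C_1, C_2 > 0$ such that
$$\forall N \geq C_1, \quad |\hat\beta_N - \beta| \leq C_2\,\|\Sigma^{-1}\|_\infty\,\|X - \hat X^N\|_p\,\|X\|_q.$$
Moreover, under (A9), Theorem 1 allows us to control the rate of convergence: $|\hat\beta_N - \beta| = O(N^{-1/d})$ as $N \to \infty$.

The proof of this theorem is deferred to Appendix A.2.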



4 Forecasting method 

Now we combine the previous estimations to tackle the forecast of $Y$ in the semiparametric regression model (1). For a given $\hat\beta_N$ defined in Section 3, the model
$$Y = f(\hat\beta_N'X, \varepsilon)$$
is a good approximation of the initial one. So the method used to forecast $Y$ in (4) with $U = \hat\beta_N'X$ will give a good forecast of $Y$ in model (1). For the asymptotics, we have to consider two parameters: the number $N$ of quantizers of $X$, which indexes $\hat\beta_N$ (see Section 3), and the number $m$ of quantizers of $U = \hat\beta_N'X$ and $Y$ (see Section 2.2). We will show that the forecast error tends to 0 as $m$ and $N$ simultaneously tend to infinity.
Using Theorem 4, there exist some $A_0, C_0 > 0$ such that, for all $N \geq C_0$,
$$|\hat\beta_N - \beta| \leq A_0\|X - \hat X^N\|_p. \tag{5}$$

We need some additional assumptions to get our asymptotic result. Let us assume that $f$ is Lipschitz and that optimal quantization in $L^p$-norm of $Y$ is possible:

(A10) $\exists\, [f]_{Lip} > 0$ s.t. $\forall u, v \in \mathbb{R}$, $\|f(u, \varepsilon) - f(v, \varepsilon)\|_p \leq [f]_{Lip}|u - v|$,

(A11) $Y \in L^{p+\delta}$,

(A12) The distribution of $Y$ does not charge hyperplanes.

We forecast $Y$ given $X$ by the random variable $\tilde Y^m$ such that $(\widehat{\hat\beta_N'X}^m, \tilde Y^m)$ is a stopped Markov chain with the same transition matrix as $(\widehat{\hat\beta_N'X}^m, \hat Y^m)$, where $\widehat{\hat\beta_N'X}^m$ and $\hat Y^m$ are optimal (in $L^p$-norm) discrete approximations of $\hat\beta_N'X$ and $Y$ with $m$ quantizers.

Theorem 5 Under (A5)-(A12), for every Lipschitz function $\phi$, there exist three real numbers $A_1$, $A_2$, $A_3$, a sequence $(g_N)$ which admits a strictly positive limit, and two integers $\bar m$ and $\bar N$ such that, for all $m \geq \bar m$ and $N \geq \bar N$, we have
$$\big\|E[\phi(Y) \mid \beta'X] - E[\phi(\tilde Y^m) \mid \widehat{\hat\beta_N'X}^m]\big\|_p \leq \frac{A_1}{N^{1/d}} + \frac{A_2}{m}\,g_N + \frac{A_3}{m}.$$

The proof of this theorem is deferred to Appendix A.4. This theorem shows that the forecast error decreases to 0 as $N$ and $m$ go to infinity.

5 Simulation study 

The aim of this simulation study is twofold. First, we are only interested in the estimation of the regression slope parameter $\beta$; next, we focus on forecasting $Y$ given $X = x$ in the semiparametric regression model (1). In the first part, since only the direction of $\beta$ is identifiable in (1), we evaluate the quality of our estimator with the squared cosine of the angle between the true direction $\beta$ and its estimates. The closer this squared cosine is to one, the better the estimation. For the forecasting part of this study, we compare the true conditional expectation and the true conditional variance of $Y$ given $X$ in model (1) with their estimates obtained via sliced inverse regression and quantization. For this, we need to ensure that $\beta$ and its estimates have the same norm and sign before quantizing the estimated index.

In this simulation study, we consider the following three models:

(M1) $Y = (\beta'X)^3 + \varepsilon$,

(M2) $Y = (\beta'X)^3 + \beta'X\,\varepsilon$,

(M3) $Y = (\beta'X)^2\exp(\beta'X/\theta) + \varepsilon$,

where $X$ follows a $d$-dimensional normal distribution $\mathcal{N}(0, I_d)$ and $\varepsilon$ is standard normally distributed. The error term $\varepsilon$ is independent of the covariate $X$. In the first (resp. second) part of the simulation study, the dimension of $X$ is $d = 10$ (resp. $d = 4$) with $\beta = (1, -1, 0, \ldots, 0)'$. The first and third models are homoscedastic while the second one is heteroscedastic. We have introduced the parameter $\theta$ in model (M3) in order to point out the efficiency of our method even if the model is symmetric dependent. Indeed, $\theta$ controls the symmetric dependency between the index $\beta'X$ and the variable of interest $Y$. When $\theta = 1$, there is no symmetric dependency and SIR works well. Symmetric dependency appears as $\theta$ increases: it is moderate when $\theta = 5$ and strong when $\theta = 10$. In such cases, classical SIR fails to estimate the direction of $\beta$; this is a known pathological situation for SIR. Our proposed approach appears more robust in the presence of a symmetric dependent model.
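For reference, the true conditional moments used as benchmarks in Section 5.2 follow directly from the three models, since $\varepsilon$ is standard normal and independent of $X$: with $t = \beta'x$, $E[Y \mid X = x]$ equals $t^3$, $t^3$ and $t^2e^{t/\theta}$, and $\mathrm{Var}(Y \mid X = x)$ equals $1$, $t^2$ and $1$ for (M1), (M2) and (M3) respectively. The illustrative function below (hypothetical name `true_moments`, with $d = 4$ and $\beta = (1, -1, 0, 0)'$ as in the forecasting experiments) evaluates them; at $x = (0.5, -0.5, 1, 0)'$ it reproduces the true values reported in Table 1.

```python
import math

BETA = (1.0, -1.0, 0.0, 0.0)   # slope parameter for d = 4


def true_moments(model, x, theta=1.0):
    """True (E[Y | X = x], Var(Y | X = x)) for models (M1)-(M3), eps ~ N(0, 1)."""
    t = sum(b * xi for b, xi in zip(BETA, x))   # index beta'x
    if model == "M1":      # Y = t^3 + eps
        return t ** 3, 1.0
    if model == "M2":      # Y = t^3 + t * eps
        return t ** 3, t ** 2
    if model == "M3":      # Y = t^2 exp(t / theta) + eps
        return t ** 2 * math.exp(t / theta), 1.0
    raise ValueError(f"unknown model {model!r}")


for x in [(0.5, -0.5, 1.0, 0.0), (-1 / 3, 0.5, 1.0, 1.0)]:
    for model, theta in [("M1", 1.0), ("M2", 1.0), ("M3", 1.0), ("M3", 5.0), ("M3", 10.0)]:
        mean, var = true_moments(model, x, theta)
        print(x, model, theta, round(mean, 2), round(var, 2))
```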

5.1 Estimation of the subspace spanned by $\beta$

Assume we have a sample $\{(X_i, Y_i), i = 1, \ldots, n\}$ of random variables generated by one of the three previous regression models. First, the covariance matrix $\Sigma$ of $X$ is estimated by the empirical covariance matrix of the $X_i$'s, denoted by $\hat\Sigma$. Then $X$ and $Y$ are quantized in $L^p$-norm by the usual algorithm given, for example, by Pagès and Printems (2003). We get the corresponding two optimal approximations $\hat X^N$ and $\hat Y^m$. Finally, the covariance matrix $\Gamma_N = \mathrm{Var}(E[\hat X^N \mid \hat Y^m])$ is calculated.

We use two sample sizes, $n = 300$ and $n = 1000$. Since the dimension of $X$ is $d = 10$, the sample size $n$ is rather small for the stochastic quantization algorithm to produce optimal quantization grids; thus we may get grids that are not quite optimal. To overcome this problem, we use the idea of the pooled slicing approach introduced by Saracco (2001). Let us quantize the variables $X$ and $Y$ $B$ times with the same sample. We thus get $B$ estimates $\Gamma_1, \ldots, \Gamma_B$ of $\Gamma_N$, and we work with their mean
$$\bar\Gamma = \frac{1}{B}\sum_{b=1}^{B}\Gamma_b.$$
Note that since the principal eigenvector of each $\Sigma^{-1}\Gamma_b$ is collinear to $\beta$, the principal eigenvector of $\Sigma^{-1}\bar\Gamma$ is also collinear to $\beta$. Let us now denote by $\hat\beta_N$ a principal eigenvector of $\hat\Sigma^{-1}\bar\Gamma$. We use this vector $\hat\beta_N$ as our estimate of the direction of $\beta$.
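The estimation steps of this subsection can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the stochastic quantization algorithm of Pagès and Printems (2003) is replaced by Lloyd's (k-means) fixed-point iteration on the sample, all expectations are empirical averages, and the names `lloyd` and `quantized_sir` as well as the simulated data are hypothetical.

```python
import numpy as np


def lloyd(points, n_codes, rng, iters=50):
    """Quadratic quantization of the empirical law of `points` by Lloyd's
    iteration; returns the codebook and the cell index of each point."""
    pts = np.asarray(points, dtype=float).reshape(len(points), -1)
    codes = pts[rng.choice(len(pts), n_codes, replace=False)].copy()
    for _ in range(iters):
        labels = ((pts[:, None, :] - codes[None]) ** 2).sum(axis=2).argmin(axis=1)
        for k in range(n_codes):
            members = pts[labels == k]
            if len(members):                       # keep empty cells unchanged
                codes[k] = members.mean(axis=0)
    labels = ((pts[:, None, :] - codes[None]) ** 2).sum(axis=2).argmin(axis=1)
    return codes, labels


def quantized_sir(X, Y, N=50, m=5, B=5, seed=0):
    """Pooled estimate: average B matrices Var(E[X^N | Y^m]) and return a
    principal eigenvector of Sigma_hat^{-1} Gamma_bar."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    gamma_bar = np.zeros((d, d))
    for _ in range(B):
        codes_x, lab_x = lloyd(X, N, rng)
        _, lab_y = lloyd(Y, m, rng)
        x_hat = codes_x[lab_x]                     # quantized covariate X^N
        centred = x_hat - x_hat.mean(axis=0)
        gamma = np.zeros((d, d))
        for k in np.unique(lab_y):
            cell = lab_y == k
            g = centred[cell].mean(axis=0)         # E[X^N | Y^m = k] - E[X^N]
            gamma += cell.mean() * np.outer(g, g)  # weight P(Y^m = k)
        gamma_bar += gamma / B
    sigma_hat = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eig(np.linalg.solve(sigma_hat, gamma_bar))
    return np.real(vecs[:, np.argmax(np.real(vals))])


rng = np.random.default_rng(1)
n, d = 1000, 4
beta = np.array([1.0, -1.0, 0.0, 0.0])
X = rng.standard_normal((n, d))
Y = (X @ beta) ** 3 + rng.standard_normal(n)
b_hat = quantized_sir(X, Y)
cos2 = (b_hat @ beta) ** 2 / ((b_hat @ b_hat) * (beta @ beta))
print(round(cos2, 3))
```

Averaging the $B$ matrices before the eigen-decomposition, rather than averaging $B$ eigenvectors, avoids having to align the arbitrary signs of separately computed eigenvectors.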

In the displayed results, we take $B = 5$ quantization grids. The numbers of quantizers for $Y$ and $X$ are respectively $m = 5$ and $N \in \{20, 30, 50, 100, 200\}$.

We generate 100 samples for each model and each sample size. For each simulated sample, we calculate the estimate $\hat\beta_N$ and the corresponding quality measure $\cos^2(\beta, \hat\beta_N)$.

We represent in Figure 1 (resp. Figure 2) the boxplots of the squared cosines according to the number $N$ of quantizers for $X$, for the various models, when $n = 300$ (resp. $n = 1000$). We also compare our estimation method with the usual SIR method when the number $H$ of slices is equal to 5 (that is, equal to the number $m$ of quantizers we used for $Y$ in our estimation process). We clearly observe that, when the number $N$ of quantizers increases, the quality of the estimator increases too, for all the models. Moreover, not surprisingly, when the sample size $n$ becomes bigger, the squared cosines increase toward one. One can see that the classical SIR approach works better than our proposed estimation method for models (M1), (M2) and (M3) with $\theta = 1$, that is, when the underlying model favors SIR. However, the numerical performances of our estimator are close (resp. very close) to those obtained with classical SIR when $N = 200$ and $n = 300$ (resp. $n = 1000$). When a symmetric dependency appears in the regression model, our approach outperforms classical SIR: this is particularly obvious when $\theta = 5$ or $10$ for model (M3) when $n = 1000$. Since in practice, with a real dataset, it is not possible to determine whether the underlying regression model is symmetric dependent or not, we can expect the use of our estimation method to provide a more robust estimation of the direction of $\beta$ than SIR.

5.2 Forecasting 

In this part, we consider samples of size $n = 10000$ generated from the previous semiparametric regression models, where the dimension of the covariate $X$ is equal to $d = 4$.

For each simulated sample, we first estimate $\beta$ by the procedure described in the previous subsection. However, we use only one quantization ($B = 1$) for the two variables $X$ and $Y$, because here we have enough data for the quantization algorithm to produce an optimal quantization grid. To do this, we use $N = 200$ quantizers for $X$ and $m = 5$ quantizers for $Y$. Here we need an estimator $\hat\beta_N$ of $\beta$ and not only of its direction. Since we know the true slope parameter $\beta$ in simulation, we can assume that the sign and the norm of $\beta$ are known.

Finally, in order to estimate the conditional distribution of $Y$ given $X$, we quantize $\hat\beta_N'X$ with $m = 100$ quantizers, and we also use $m = 100$ quantizers for the quantization of $Y$ in this step. Then we get the law of $\tilde Y^m$ given $\widehat{\hat\beta_N'X}^m$. Now we can estimate the conditional expectation and the conditional variance of $Y$ given $X$ by those of $\tilde Y^m$ given $\widehat{\hat\beta_N'X}^m$.

In Table 1 (resp. Table 2), we present some estimation results for the conditional expectation and the conditional variance of $Y$ given $X = (0.5, -0.5, 1, 0)'$ (resp. given $X = (-1/3, 0.5, 1, 1)'$). For each model, we compare our estimates with the true values of the conditional expectation and the conditional variance, and we evaluate the corresponding relative error. One can see that the values estimated with our proposed method are very close to the true ones, for the conditional expectation as well as for the conditional variance, whatever the model. The relative errors (in absolute value) are lower than 16% for most of the models.

Finally, we generate ten values $x_j$ from a uniform law on $[-2, 2]^4$. For each value $x_j$ and for each model, we estimate the conditional expectation and the conditional variance of $Y$ given $X = x_j$. Figure 3 gives the two boxplots of the relative errors of these conditional moments for model (M3) with $\theta = 5$. One can observe that our estimation procedure provides reasonable values for these relative errors. We obtain very similar results (not given here) for the other models.

In this subsection, we only focus on the first two conditional moments of $Y$ given $X$. However, with the proposed approach, it is straightforward to forecast using, for instance, the conditional median. In the same spirit, it is possible to estimate conditional quantiles (the 5% and 95% conditional quantiles) in order to provide a 90% predictive interval.
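Since the forecast is a discrete law on the grid of $Y$, conditional quantiles and predictive intervals can be read off its cumulative distribution function. A minimal sketch with hypothetical names, taking the law as (value, probability) pairs such as one row of the estimated transition matrix, and using the convention that the $\alpha$-quantile is the smallest grid value whose cumulative probability reaches $\alpha$:

```python
def conditional_quantile(law, alpha, tol=1e-12):
    """Smallest value v of a discrete law with P(Y <= v) >= alpha."""
    cum = 0.0
    for value, prob in sorted(law):
        cum += prob
        if cum >= alpha - tol:   # tol absorbs rounding in the cumulative sum
            return value
    return max(value for value, _ in law)


def predictive_interval(law, level=0.90):
    """Equal-tailed interval between the (1 - level)/2 and (1 + level)/2 quantiles."""
    a = (1.0 - level) / 2.0
    return conditional_quantile(law, a), conditional_quantile(law, 1.0 - a)


# Hypothetical forecast law: ten grid values with equal mass 0.1.
law = [(float(v), 0.1) for v in range(10)]
print(conditional_quantile(law, 0.5))   # conditional median: 4.0
print(predictive_interval(law))         # 90% predictive interval: (0.0, 9.0)
```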
6 Concluding remarks



We have presented a method using the probabilistic tool of quantization in order to tackle both the estimation of the EDR direction and the forecasting of $Y$ given $X$ by its conditional law in the semiparametric regression model (1). We proved the convergence of our estimators and gave their rate of convergence. To our knowledge, the use of optimal quantization in nonparametric or semiparametric statistics is new, and forecasting the variable of interest by its conditional distribution with this kind of approach is original. The simulation studies give good results for large samples ($n = 1000$); in this case, the performance is comparable to that of the SIR method, and it is even better when a symmetric dependency appears in the regression model. In practice, with a real dataset, we do not know whether the underlying regression model is symmetric dependent or not, so we can expect the use of our estimation method to provide a more robust estimation of the direction of $\beta$ than SIR.

Acknowledgement Professor François Dufour gave us the main idea of this paper, that is, to use quantization and SIR methods together for the problem of dimension reduction and forecasting in model (1). We are most grateful to him for this suggestion.
Conditional expectation

Model            True value   Estimation   Relative error
(M1)             1            1.07          0.07
(M2)             1            0.84         -0.16
(M3), θ = 1      2.72         2.56         -0.06
(M3), θ = 5      1.22         1.02         -0.16
(M3), θ = 10     1.11         1.11          0.00

Conditional variance

Model            True value   Estimation   Relative error
(M1)             1            1.01          0.01
(M2)             1            0.92         -0.08
(M3), θ = 1      1            1.14          0.14
(M3), θ = 5      1            1.05          0.05
(M3), θ = 10     1            1.07          0.07

Table 1 - Forecasting of Y given X = (0.5, -0.5, 1, 0)'



Conditional expectation

Model            True value   Estimation   Relative error
(M1)             -0.56        -0.64         0.11
(M2)             -0.58        -0.51        -0.12
(M3), θ = 1      0.30         0.52          0.71
(M3), θ = 5      0.59         0.80          0.36
(M3), θ = 10     0.64         0.61         -0.05

Conditional variance

Model            True value   Estimation   Relative error
(M1)             1            1.01          0.01
(M2)             0.69         0.80          0.15
(M3), θ = 1      1            0.77         -0.23
(M3), θ = 5      1            1.01          0.01
(M3), θ = 10     1            1.09          0.09

Table 2 - Forecasting of Y given X = (-1/3, 0.5, 1, 1)'
[Figure 3, model (M3) with θ = 5; panels: conditional expectation, conditional variance]

Figure 3 - Relative error in the estimation of the conditional expectation and variance

A Proofs 

A.1 Proof of Theorem 2

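Let us define $F(u) = E[\phi(Y) \mid U = u]$ and $\hat F(u) = E[\phi(\hat Y^N) \mid \hat U^N = u]$. By the triangle inequality, we have
$$\|F(U) - \hat F(\hat U^N)\|_p \leq \|F(U) - F(\hat U^N)\|_p + \|F(\hat U^N) - E[\phi(Y) \mid \hat U^N]\|_p + \|E[\phi(Y) \mid \hat U^N] - \hat F(\hat U^N)\|_p.$$
Furthermore, using the Lipschitz property of $\phi$ and $f$ and the independence between $U$ and $\varepsilon$, we have
$$\forall u, v \in \mathbb{R}^k, \quad |F(u) - F(v)| \leq [\phi]_{Lip}[f]_{Lip}|u - v|,$$
so that the first term is bounded by $[\phi]_{Lip}[f]_{Lip}\|U - \hat U^N\|_p$. Since $\hat U^N$ is a function of $U$, we have $E[\phi(Y) \mid \hat U^N] = E[F(U) \mid \hat U^N]$. Then, by the $L^p$-contraction property of conditional expectation, we get
$$\|F(\hat U^N) - E[\phi(Y) \mid \hat U^N]\|_p \leq \|F(\hat U^N) - F(U)\|_p \leq [\phi]_{Lip}[f]_{Lip}\|U - \hat U^N\|_p.$$
Moreover, since $E[\phi(Y) \mid \hat U^N] - \hat F(\hat U^N) = E[\phi(Y) - \phi(\hat Y^N) \mid \hat U^N]$, using again the $L^p$-contraction property of conditional expectation and the Lipschitz property of $\phi$, we obtain
$$\|E[\phi(Y) \mid \hat U^N] - \hat F(\hat U^N)\|_p \leq \|\phi(Y) - \phi(\hat Y^N)\|_p \leq [\phi]_{Lip}\|Y - \hat Y^N\|_p.$$
This yields the expected inequality. □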

A.2 Proof of Theorem 4

Before proving Theorem 4, we first give in Lemma 6 a useful bound on the distance between $\Gamma$ and $\Gamma_N$.

Lemma 6 Under (A5)-(A8), we have $\|\Gamma - \Gamma_N\|_\infty \leq 2d\,\|X - \hat X^N\|_p\,\|X\|_q$.
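Proof Let $\Delta_N = \Gamma - \Gamma_N$. We control all the terms of this symmetric matrix hereafter.

- Study of the diagonal terms. For all $1 \leq i \leq d$, we have
$$\Delta_N(i,i) = \mathrm{Var}\big(E[X \mid \tilde Y]_i\big) - \mathrm{Var}\big(E[\hat X^N \mid \tilde Y]_i\big) = E\big[E[X \mid \tilde Y]_i^2\big] - E[X]_i^2 - E\big[E[\hat X^N \mid \tilde Y]_i^2\big] + E[\hat X^N]_i^2.$$
Using the stationarity property, which gives $E[\hat X^N] = E[X]$, we obtain
$$\Delta_N(i,i) = E\big[E[X \mid \tilde Y]_i^2\big] - E\big[E[\hat X^N \mid \tilde Y]_i^2\big] = E\Big[\big(E[X \mid \tilde Y]_i - E[\hat X^N \mid \tilde Y]_i\big)\big(E[X \mid \tilde Y]_i + E[\hat X^N \mid \tilde Y]_i\big)\Big].$$
By Hölder's inequality, the $L^p$- and $L^q$-contraction properties of conditional expectation and the stationarity property, we get
$$\begin{aligned}
|\Delta_N(i,i)| &\leq \big\|E[X - \hat X^N \mid \tilde Y]_i\big\|_p\,\big\|E[X + \hat X^N \mid \tilde Y]_i\big\|_q \\
&\leq \|X - \hat X^N\|_p\,\|X + \hat X^N\|_q \\
&\leq \|X - \hat X^N\|_p\big(\|X\|_q + \|\hat X^N\|_q\big) \\
&\leq 2\|X - \hat X^N\|_p\,\|X\|_q.
\end{aligned}$$

- Study of the non-diagonal terms. For all $(i, j) \in \{1, \ldots, d\}^2$ such that $i \neq j$, we have
$$\begin{aligned}
\Delta_N(i,j) &= \mathrm{Cov}\big(E[X \mid \tilde Y]_i, E[X \mid \tilde Y]_j\big) - \mathrm{Cov}\big(E[\hat X^N \mid \tilde Y]_i, E[\hat X^N \mid \tilde Y]_j\big) \\
&= E\big[E[X \mid \tilde Y]_i\,E[X \mid \tilde Y]_j - E[\hat X^N \mid \tilde Y]_i\,E[\hat X^N \mid \tilde Y]_j\big] \\
&= E\Big[\big(E[X \mid \tilde Y]_i - E[\hat X^N \mid \tilde Y]_i\big)\big(E[X \mid \tilde Y]_j + E[\hat X^N \mid \tilde Y]_j\big)\Big] \\
&\quad + E\big[E[\hat X^N \mid \tilde Y]_i\,E[X \mid \tilde Y]_j\big] - E\big[E[\hat X^N \mid \tilde Y]_j\,E[X \mid \tilde Y]_i\big].
\end{aligned}$$
By symmetry, we also have
$$\begin{aligned}
\Delta_N(i,j) &= E\Big[\big(E[X \mid \tilde Y]_j - E[\hat X^N \mid \tilde Y]_j\big)\big(E[X \mid \tilde Y]_i + E[\hat X^N \mid \tilde Y]_i\big)\Big] \\
&\quad + E\big[E[\hat X^N \mid \tilde Y]_j\,E[X \mid \tilde Y]_i\big] - E\big[E[\hat X^N \mid \tilde Y]_i\,E[X \mid \tilde Y]_j\big].
\end{aligned}$$
Then, summing the last two formulae, we get
$$\begin{aligned}
2\Delta_N(i,j) &= E\Big[\big(E[X \mid \tilde Y]_i - E[\hat X^N \mid \tilde Y]_i\big)\big(E[X \mid \tilde Y]_j + E[\hat X^N \mid \tilde Y]_j\big)\Big] \\
&\quad + E\Big[\big(E[X \mid \tilde Y]_j - E[\hat X^N \mid \tilde Y]_j\big)\big(E[X \mid \tilde Y]_i + E[\hat X^N \mid \tilde Y]_i\big)\Big].
\end{aligned}$$
As for the diagonal terms, by Hölder's inequality and the $L^p$- and $L^q$-contraction properties of conditional expectation, we obtain
$$|\Delta_N(i,j)| \leq 2\|X - \hat X^N\|_p\,\|X\|_q.$$

Finally, we obtain
$$\|\Delta_N\|_\infty = \max_{1 \leq i \leq d}\sum_{j=1}^{d}|\Delta_N(i,j)| \leq 2d\,\|X - \hat X^N\|_p\,\|X\|_q,$$
and the proof of the lemma is complete. □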
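Lemma 6 yields that the eigenvalues of $\Sigma^{-1}\Gamma_N$ tend to the eigenvalues of $\Sigma^{-1}\Gamma$ as $N$ goes to infinity. In particular, if $\rho$ and $\rho_N$ denote the principal eigenvalues of $\Sigma^{-1}\Gamma$ and $\Sigma^{-1}\Gamma_N$, we have
$$\rho_N \to \rho \quad \text{as } N \to \infty.$$
Consider $z_N$ the principal eigenvector of norm 1 of $\Sigma^{-1}\Gamma_N$ such that $z_N = \alpha_N\beta + z_N^\perp$, where $z_N^\perp \perp \beta$ and $\alpha_N \geq 0$. Thus we have
$$\rho_N z_N = \Sigma^{-1}\Gamma_N z_N = \Sigma^{-1}\big\{(\Gamma_N - \Gamma)z_N + \Gamma(\alpha_N\beta + z_N^\perp)\big\}.$$
From SIR theory, the rank of $\Gamma$ is 1 and $\beta$ is a principal eigenvector of $\Sigma^{-1}\Gamma$. Thus the kernel of $\Sigma^{-1}\Gamma$ is $(d-1)$-dimensional and contains $z_N^\perp$. Consequently, we get $\rho_N z_N - \alpha_N\rho\beta = \Sigma^{-1}(\Gamma_N - \Gamma)z_N$. Now, let us show that the sequence $(\alpha_N)_{N \geq 1}$ has a strictly positive limit. We have
$$|\rho_N z_N - \alpha_N\rho\beta| \leq \|\Sigma^{-1}\|_\infty\,\|\Gamma_N - \Gamma\|_\infty \to 0 \quad \text{as } N \to \infty.$$
Since $(\rho_N)$ has a nonzero limit and $z_N$ has norm 1 for every $N$, we have $\lim_{N \to \infty}\alpha_N = \frac{1}{|\beta|}$. Let $N_1$ be an integer such that, for all $N \geq N_1$, $\alpha_N > 0$. Let $C = \min_{N \geq N_1}\alpha_N$. Moreover, for all $N \geq N_1$, let $\hat\beta_N = \frac{\rho_N}{\alpha_N\rho}z_N$. Thus, for all $N \geq N_1$, we have
$$|\hat\beta_N - \beta| \leq \frac{1}{C\rho}\,\|\Sigma^{-1}\|_\infty\,\|\Delta_N\|_\infty.$$
Thanks to Lemma 6 and Theorem 1, we obtain the expected result. □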
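A.3 Proof of Theorem 3

Let $(\bar\beta_N)$ denote the sequence of principal eigenvectors constructed in the previous proof. Then, for all $N \geq N_1$, there exists some $\lambda_N \neq 0$ such that $\hat\beta_N = \lambda_N\bar\beta_N$, and
$$\cos^2(\hat\beta_N, \beta) = \cos^2(\bar\beta_N, \beta) = \frac{(\bar\beta_N'\beta)^2}{|\bar\beta_N|^2\,|\beta|^2} \to \frac{|\beta|^4}{|\beta|^2\,|\beta|^2} = 1 \quad \text{as } N \to \infty. \qquad \square$$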

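A.4 Proof of Theorem 5

Let $Y_N = f(\hat\beta_N'X, \varepsilon)$ and let $\hat Y_N^m$ be its projection on an optimal $m$-grid in $L^p$-norm. First, we show that the forecast error is smaller than a sum of four terms given in Lemma 7. Then we control each term to complete the proof.

Lemma 7 For any Lipschitz function $\phi$, we have, for all $N$ and $m$,
$$\begin{aligned}
\big\|E[\phi(Y) \mid \beta'X] - E[\phi(\tilde Y^m) \mid \widehat{\hat\beta_N'X}^m]\big\|_p
&\leq 4[\phi]_{Lip}[f]_{Lip}\|X\|_p\,|\beta - \hat\beta_N| + 2[\phi]_{Lip}[f]_{Lip}\big\|\hat\beta_N'X - \widehat{\hat\beta_N'X}^m\big\|_p \\
&\quad + 2[\phi]_{Lip}\|Y_N - \hat Y_N^m\|_p + [\phi]_{Lip}\|Y - \hat Y^m\|_p.
\end{aligned}$$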
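Proof Let $V = \beta'X$, $V_N = \hat\beta_N'X$, $\hat V_N^m = \widehat{\hat\beta_N'X}^m$ and $F(v) = E[\phi(Y) \mid V = v]$. Since $(\hat V_N^m, \tilde Y^m)$ and $(\hat V_N^m, \hat Y^m)$ have the same transition matrix, $E[\phi(\tilde Y^m) \mid \hat V_N^m] = E[\phi(\hat Y^m) \mid \hat V_N^m]$. Thus, the forecast error can be written as follows:
$$\begin{aligned}
E[\phi(Y) \mid V] - E[\phi(\tilde Y^m) \mid \hat V_N^m] &= \big(F(V) - F(V_N)\big) + \big(F(V_N) - E[\phi(Y) \mid V_N]\big) \\
&\quad + \big(E[\phi(Y) \mid V_N] - E[\phi(Y_N) \mid V_N]\big) + \big(E[\phi(Y_N) \mid V_N] - E[\phi(\hat Y_N^m) \mid \hat V_N^m]\big) \\
&\quad + \big(E[\phi(\hat Y_N^m) \mid \hat V_N^m] - E[\phi(Y_N) \mid \hat V_N^m]\big) + \big(E[\phi(Y_N) \mid \hat V_N^m] - E[\phi(\hat Y^m) \mid \hat V_N^m]\big).
\end{aligned}$$
We provide an upper bound for each term in $L^p$-norm. For the first one, we get
$$\|F(V) - F(V_N)\|_p \leq [\phi]_{Lip}[f]_{Lip}\|V - V_N\|_p \leq [\phi]_{Lip}[f]_{Lip}\|X\|_p\,|\beta - \hat\beta_N|.$$
For the second term, since $\sigma(\hat\beta_N'X) \subset \sigma(X)$, we have
$$F(V_N) - E[\phi(Y) \mid \hat\beta_N'X] = E\big[F(V_N) - E[\phi(Y) \mid X] \,\big|\, \hat\beta_N'X\big].$$
Since $E[\phi(Y) \mid X] = E[\phi(Y) \mid \beta'X] = F(V)$ from the model assumption, we obtain, by the $L^p$-contraction property of conditional expectation,
$$\|F(V_N) - E[\phi(Y) \mid \hat\beta_N'X]\|_p \leq \|F(V_N) - F(V)\|_p \leq [\phi]_{Lip}[f]_{Lip}\|X\|_p\,|\beta - \hat\beta_N|.$$
For the third term, we get
$$\big\|E[\phi(Y) \mid \hat\beta_N'X] - E[\phi(Y_N) \mid \hat\beta_N'X]\big\|_p \leq \|\phi(Y) - \phi(Y_N)\|_p \leq [\phi]_{Lip}[f]_{Lip}\|X\|_p\,|\beta - \hat\beta_N|.$$
For the fourth term, we apply Theorem 2 to $Y_N = f(\hat\beta_N'X, \varepsilon)$ and we obtain
$$\big\|E[\phi(Y_N) \mid \hat\beta_N'X] - E[\phi(\hat Y_N^m) \mid \hat V_N^m]\big\|_p \leq 2[\phi]_{Lip}[f]_{Lip}\|\hat\beta_N'X - \hat V_N^m\|_p + [\phi]_{Lip}\|Y_N - \hat Y_N^m\|_p.$$
For the fifth term, we have
$$\big\|E[\phi(\hat Y_N^m) \mid \hat V_N^m] - E[\phi(Y_N) \mid \hat V_N^m]\big\|_p \leq \|\phi(\hat Y_N^m) - \phi(Y_N)\|_p \leq [\phi]_{Lip}\|Y_N - \hat Y_N^m\|_p.$$
Finally, for the last term, we obtain
$$\begin{aligned}
\big\|E[\phi(Y_N) \mid \hat V_N^m] - E[\phi(\hat Y^m) \mid \hat V_N^m]\big\|_p &\leq \|\phi(Y_N) - \phi(Y)\|_p + \|\phi(Y) - \phi(\hat Y^m)\|_p \\
&\leq [\phi]_{Lip}\|Y_N - Y\|_p + [\phi]_{Lip}\|Y - \hat Y^m\|_p \\
&\leq [\phi]_{Lip}[f]_{Lip}\|X\|_p\,|\beta - \hat\beta_N| + [\phi]_{Lip}\|Y - \hat Y^m\|_p.
\end{aligned}$$
Summing these six inequalities yields the expected result. □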
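The rest of the proof of Theorem 5 splits into four parts.

(i) Using (5) and Theorem 1, since $\|X\|_{p+\delta} < \infty$, there exist $D_1$, $D_2$, $D_3$ such that, $\forall N \geq \max(C_0, D_3)$, we have
$$|\hat\beta_N - \beta| \leq \frac{A_0}{N^{1/d}}\left\{D_1\|X\|_{p+\delta}^{p+\delta} + D_2\right\}^{1/p}. \tag{6}$$

(ii) Using Theorem 1 again, there exist $C_1$, $C_2$, $C_3$ such that, $\forall m \geq C_3$, we have
$$\|Y_N - \hat Y_N^m\|_p \leq \frac{1}{m}\left\{C_1\|Y_N\|_{p+\delta}^{p+\delta} + C_2\right\}^{1/p}.$$
By the triangle inequality, we get
$$\begin{aligned}
\|Y_N\|_{p+\delta} &\leq \|Y\|_{p+\delta} + \|Y - Y_N\|_{p+\delta} \\
&\leq \|Y\|_{p+\delta} + [f]_{Lip}\|X\|_{p+\delta}\,|\beta - \hat\beta_N| \\
&\leq \|Y\|_{p+\delta} + [f]_{Lip}\|X\|_{p+\delta}\,A_0\|X - \hat X^N\|_p \quad \text{for } N \geq C_0.
\end{aligned}$$
From (6), for all $N \geq \max(C_0, D_3)$, we have
$$\|Y_N\|_{p+\delta}^{p+\delta} \leq \left(\|Y\|_{p+\delta} + \frac{A_1A_2}{N^{1/d}}\right)^{p+\delta},$$
where $A_1 = A_0[f]_{Lip}\|X\|_{p+\delta}$ and $A_2 = \big(D_1\|X\|_{p+\delta}^{p+\delta} + D_2\big)^{1/p}$. Finally, $\forall m \geq C_3$ and $\forall N \geq \max(C_0, D_3)$, we have
$$\|Y_N - \hat Y_N^m\|_p \leq \frac{1}{m}\left\{C_1\left(\frac{A_1A_2}{N^{1/d}} + \|Y\|_{p+\delta}\right)^{p+\delta} + C_2\right\}^{1/p}. \tag{7}$$

(iii) Using Theorem 1 again, there exist $C_1'$, $C_2'$, $C_3' > 0$ such that, $\forall m \geq C_3'$,
$$\big\|\hat\beta_N'X - \widehat{\hat\beta_N'X}^m\big\|_p \leq \frac{1}{m}\left\{C_1'\|\hat\beta_N'X\|_{p+\delta}^{p+\delta} + C_2'\right\}^{1/p}.$$
By (5), we get
$$\forall N \geq C_0, \quad |\hat\beta_N| \leq |\beta| + A_0\|X - \hat X^N\|_p.$$
Using (6), we have, $\forall N \geq \max(C_0, D_3)$,
$$\big\|\hat\beta_N'X - \widehat{\hat\beta_N'X}^m\big\|_p \leq \frac{1}{m}\left\{C_1'\|X\|_{p+\delta}^{p+\delta}\left(|\beta| + \frac{A_0A_2}{N^{1/d}}\right)^{p+\delta} + C_2'\right\}^{1/p}. \tag{8}$$

(iv) Using Theorem 1 again, there exist $C_1''$, $C_2''$, $C_3''$ such that, thanks to $Y \in L^{p+\delta}$, for all $m \geq C_3''$, we have
$$\|Y - \hat Y^m\|_p \leq \frac{1}{m}\left\{C_1''\|Y\|_{p+\delta}^{p+\delta} + C_2''\right\}^{1/p}. \tag{9}$$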
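Plugging (6), (7), (8) and (9) into the inequality of Lemma 7, we have, $\forall m \geq \max(C_3, C_3', C_3'')$ and $\forall N \geq \max(C_0, D_3)$,
$$\begin{aligned}
\big\|E[\phi(Y) \mid \beta'X] - E[\phi(\tilde Y^m) \mid \widehat{\hat\beta_N'X}^m]\big\|_p
&\leq \frac{4[\phi]_{Lip}[f]_{Lip}\|X\|_p\,A_0A_2}{N^{1/d}} \\
&\quad + \frac{2[\phi]_{Lip}}{m}\left\{C_1\left(\frac{A_1A_2}{N^{1/d}} + \|Y\|_{p+\delta}\right)^{p+\delta} + C_2\right\}^{1/p} \\
&\quad + \frac{2[\phi]_{Lip}[f]_{Lip}}{m}\left\{C_1'\|X\|_{p+\delta}^{p+\delta}\left(|\beta| + \frac{A_0A_2}{N^{1/d}}\right)^{p+\delta} + C_2'\right\}^{1/p} \\
&\quad + \frac{[\phi]_{Lip}}{m}\left\{C_1''\|Y\|_{p+\delta}^{p+\delta} + C_2''\right\}^{1/p}.
\end{aligned}$$
This achieves the proof. □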